Journal of Medical Imaging
SPIE-Intl Soc Optical Eng
Preprints posted in the last 7 days, ranked by how well they match Journal of Medical Imaging's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Sivakumar, E.; Anand, A.
Computer vision and deep learning techniques, including convolutional neural networks (CNNs) and transformers, have increased the performance of medical image classification systems. However, training deep learning models using medical images is a challenging task that necessitates a substantial amount of annotated data. In this paper, we implement data augmentation strategies to tackle dataset imbalance in the VinDr-SpineXR dataset, which has a lower number of spine abnormality X-ray images compared to normal spine X-ray images. Geometric transformations and synthetic image generation using Generative Adversarial Networks are explored and applied to the abnormal classes of the dataset, and classifier performance is validated using VGG-16 and InceptionNet to identify the most effective augmentation technique. Additionally, we introduce a hybrid augmentation technique that addresses class imbalance, reduces computational overhead relative to a GAN-only approach, and achieves ~99% validation accuracy with both classifiers across all three case studies. Keywords: Data augmentation, Generative Adversarial Network, VGG-16, InceptionNet, Class imbalance, Computer vision, Spine X-ray, Radiology.
Wang, S.; Ayubcha, C.; Hua, Y.; Beam, A.
Background: Developing generalizable neuroimaging models is often hindered by limited labeled data which has led to an increased interest in unsupervised inverse learning. Existing approaches often neglect geometric principles and struggle with diverse pathologies. We propose a symmetry-informed inverse learning foundation model to address these shortcomings for robust and efficient anomaly detection in brain MRI. Methods: Our framework employs a reconstruction-to-embedding pipeline, trained exclusively on healthy brain MRI slices. A 2D U-Net uses a novel, symmetry-aware masking strategy to reconstruct a disorder-free slice. Difference maps are embedded into a 1024-dimensional latent space via a Beta-VAE. Anomaly scoring is performed using Mahalanobis distance. We evaluated generalization by fine-tuning on external lesion datasets, BraTS Africa (SSA), and the ADNI-derived Alzheimer disease cohort (Alz). Results: On the source metastasis (Mets) dataset, the framework achieved high performance (AB1+MSE: 99.28% accuracy, 99.79% sensitivity). Generalization to the external lesion dataset (SSA) was robust, with the Symmetry ROC configuration achieving 91.93% accuracy. Transfer to the Alzheimer dataset (Alz) was more challenging, achieving a peak accuracy of 70.54% with a high false-positive rate, suggesting difficulty in separating subtle, diffuse changes. Conclusion: The symmetry-informed inverse learning framework establishes a robust foundation model for neuroimaging, showing strong performance for focal lesions and successful generalization under domain shift. Limitations in diffuse neurodegeneration underscore the necessity for richer representations and multimodal integration to improve future foundation models.
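The anomaly-scoring step described above — a Mahalanobis distance computed over latent embeddings of difference maps, with statistics fit only on healthy data — can be sketched as follows. The Beta-VAE embedding is stood in for by random vectors, and the dimensionality and values below are illustrative, not the paper's (which uses a 1024-dimensional latent space):

```python
import numpy as np

def fit_gaussian(healthy_embeddings, eps=1e-6):
    """Fit mean and regularized inverse covariance on healthy latent vectors."""
    mu = healthy_embeddings.mean(axis=0)
    cov = np.cov(healthy_embeddings, rowvar=False)
    cov += eps * np.eye(cov.shape[0])  # regularize for numerical invertibility
    return mu, np.linalg.inv(cov)

def mahalanobis_score(z, mu, cov_inv):
    """Anomaly score: Mahalanobis distance of embedding z from the healthy mean."""
    d = z - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(0)
# Toy stand-in for Beta-VAE embeddings: 500 "healthy" vectors in 8 dimensions
healthy = rng.normal(0.0, 1.0, size=(500, 8))
mu, cov_inv = fit_gaussian(healthy)

in_dist = mahalanobis_score(rng.normal(0.0, 1.0, size=8), mu, cov_inv)
anomaly = mahalanobis_score(np.full(8, 6.0), mu, cov_inv)
print(in_dist, anomaly)  # the far-out point scores much higher
```

A threshold on this score (e.g. a percentile of healthy-set distances) would then flag anomalous slices; the abstract does not state how the operating point was chosen.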
Brito-Pacheco, D. A.; Giannopoulos, P.; Reyes-Aldasoro, C. C.
In this work, the impact of outliers on the performance of machine learning and deep learning models is investigated, specifically for the case of histopathological images of colorectal cancer stained with Haematoxylin and Eosin. The evaluation of the impact is done through the systematic comparison of one machine learning model (Random Forests) and one deep learning model (ResNet-18). Both models were trained with the popular NCT-CRC-HE-100K dataset and tested on the CRC-VAL-HE-7K companion set. Then, a curation process was performed by analysing the divergence of patches based on chromatic, textural and topological features of the training set and removing outliers to repeat the training with a cleaned dataset. The results showed that machine learning models can benefit more from improvements in data quality than deep learning models. Further, the results suggest that deep learning models are more robust to outliers, as their architectures can learn features during training beyond the chromatic, textural and topological ones considered here.
Aquaro, G. D.; Licordari, R.; De Gori, C.; Todiere, G.; Ianni, U.; Barison, A.; De Luca, A.; Folgheraiter, A.; Grigoratos, C.; Alberti, M.; Lombardo, M.; De Caterina, R.; Sinagra, G.; Emdin, M.; Di Bella, G.; Fulceri, L.
Background: Late gadolinium enhancement (LGE) quantification by cardiovascular magnetic resonance is central to risk stratification in hypertrophic cardiomyopathy (HCM), yet conventional techniques require contour tracing and region-of-interest (ROI) placement, which may reduce reproducibility and increase analysis time. We developed a novel visual standardized approach, the Visual Standardized Quantification of LGE (VISTAQ), that does not require myocardial contouring, arbitrary ROI positioning, or dedicated post-processing software. Methods: In this multicenter, multivendor retrospective study, LGE images from 400 patients (100 prior myocardial infarction, 250 HCM, 50 other non-ischemic heart diseases) were analyzed. VISTAQ subdivides each myocardial segment into transmural mini-segments and classifies LGE visually using predefined criteria, expressing global LGE burden as the percentage of positive mini-segments. Reproducibility was assessed in 250 patients across different observer expertise levels using intraclass correlation coefficients (ICC) and Bland–Altman analysis. In 100 HCM patients, VISTAQ was compared with conventional methods (mean+2SD, +5SD, +6SD, FWHM, visual thresholding). Prognostic performance was evaluated in 250 HCM patients over a median 5-year follow-up. Results: VISTAQ demonstrated excellent intra- and inter-observer reproducibility (ICC up to 0.98 and 0.97, respectively), consistent across disease subtypes. Compared with conventional techniques, VISTAQ showed similar ICC to FWHM but significantly lower net and absolute inter-observer differences (median absolute difference 1.3%). Mean+2SD markedly overestimated LGE, whereas mean+6SD slightly underestimated LGE compared with VISTAQ, mean+5SD, FWHM, and visual thresholding. Analysis time was substantially shorter with VISTAQ (median 105 vs. 375 seconds, p<0.0001). During follow-up, 21 hard cardiac events occurred in the HCM population.
An LGE threshold >10% predicted events with higher accuracy using VISTAQ (AUC 0.90; sensitivity 85%; specificity 94%) compared with mean+6SD (AUC 0.75; sensitivity 57%; specificity 93%). Conclusions: VISTAQ provides highly reproducible, time-efficient LGE quantification without dedicated software and demonstrates non-inferior prognostic discrimination in HCM compared with conventional threshold-based techniques.
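Once each transmural mini-segment has been visually classified, the VISTAQ burden described above reduces to a simple proportion. A minimal sketch, assuming a hypothetical 3-way transmural subdivision purely for illustration (the paper's actual subdivision scheme is not given in the abstract):

```python
def lge_burden(minisegment_calls):
    """Global LGE burden as the percentage of visually positive mini-segments.

    minisegment_calls: nested list, one list of booleans per myocardial
    segment (True = LGE judged present in that transmural mini-segment).
    """
    flat = [call for segment in minisegment_calls for call in segment]
    return 100.0 * sum(flat) / len(flat)

# Toy reading: 4 segments, each split into 3 transmural mini-segments
calls = [
    [True, False, False],
    [False, False, False],
    [True, True, False],
    [False, False, False],
]
burden = lge_burden(calls)
print(f"LGE burden: {burden:.1f}%")   # 3 of 12 mini-segments positive
high_risk = burden > 10.0             # the >10% prognostic threshold reported
print("exceeds 10% threshold:", high_risk)
```

Because the method counts discrete visual calls rather than thresholding pixel intensities, it needs no dedicated post-processing software, which is the point of the approach.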
Adeluwoye, A. O.; Gbadegesin, M. O.; James, F. M.; Otegbade, P. S.; Alabetutu, A.
Digital pathology, coupled with advanced image recognition algorithms, represents a transformative frontier in histopathological diagnosis. This exploratory study from a sub-Saharan African laboratory investigates the application of a Convolutional Neural Network (CNN) model, specifically leveraging the VGG16 architecture with transfer learning, for automated analysis and classification of selected gastrointestinal (GIT) and liver tissue samples, incorporating both routine and specialized staining protocols. The study utilized a dataset comprising 114 samples (18 liver, 96 GIT images) derived from archival formalin-fixed paraffin-embedded tissue blocks at University College Hospital, Ibadan, Nigeria. Specialized staining techniques included Alcian Yellow for GIT mucin visualization and Masson's Trichrome for liver fibrosis assessment, alongside conventional H&E staining. Model performance was evaluated using statistical methodologies including Wilson Score confidence intervals (CI), Bayesian probability assessment, and effect size analysis. Results reveal a striking dichotomy in model performance. The GIT tissue model achieved perfect classification accuracy (100% test accuracy) with exceptional statistical significance (Z=10.0, p<0.0001), Wilson CI [96.29%, 99.99%], Cohen's h=1.571, and Bayesian probability >99.99%. Conversely, the liver tissue model demonstrated diagnostic failure (42.86% test accuracy), with Z=-1.428, p=0.9236, Wilson CI [33.59%, 52.65%], Cohen's h=-0.144, and Bayesian probability of 7.64%. This performance divergence correlates with training data availability, as the liver dataset fell far below empirically established thresholds (>100-200 samples) for reliable classification. The liver model's failure reveals limitations in transfer learning with insufficient data.
These findings underscore critical implications for AI-enhanced digital pathology, demonstrating the deployment potential of the GIT model and supporting the case for tissue-specific model development.
Chandra, S.
Background: Current deep learning models in computational pathology, radiology, and digital pathology produce opaque predictions that lack the explainable artificial intelligence (xAI) capabilities required for clinical adoption. Despite achieving radiologist-level performance in tasks from whole-slide image (WSI) classification to mammographic screening, these models function as black boxes: clinicians cannot trace predictions to specific biological features, verify outputs against established morphological criteria, or integrate AI reasoning into precision oncology workflows and tumor board decision-making. Methods: We present Virtual Spectral Decomposition (VSD), a modality-agnostic, interpretable-by-design framework that decomposes medical images into six biologically interpretable tissue composition channels using sigmoid threshold functions - the same mathematical structure as CT windowing. Unlike post-hoc xAI methods (Grad-CAM, SHAP, LIME) applied to black-box deep learning models, VSD channels have pre-defined biological meanings derived from tissue physics, providing inherent explainability without sacrificing quantitative rigor. For whole-slide image (WSI) analysis in digital pathology, we introduce the dendritic tile selection algorithm, a biologically-inspired hierarchical architecture achieving 70-80% computational reduction while preferentially sampling the tumor immune microenvironment. VSD is validated across three cancer types and imaging modalities: pancreatic ductal adenocarcinoma (PDAC) on CT imaging, lung adenocarcinoma (LUAD) on H&E-stained pathology slides using TCGA data, and breast cancer on screening mammography. Composition entropy of the six-channel vector is computed as a visual Biological Entropy Index (vBEI) - an imaging biomarker quantifying the diversity of active biological defense systems. 
Results: In pancreatic cancer, the fat-to-stroma ratio (a novel CT-derived radiomics biomarker) declines from >5.0 (normal) to <0.5 (advanced PDAC), enabling early detection of desmoplastic invasion before mass formation on standard imaging. In lung cancer, composition entropy from H&E whole-slide images correlates with tumor immune microenvironment markers from RNA-seq (CD3: rho=+0.57, p=0.009; CD8: rho=+0.54, p=0.015; PD-1: rho=+0.54, p=0.013) and predicts overall survival (low entropy immune-desert phenotype: 71% mortality vs 29%, p=0.032; n=20 TCGA-LUAD), providing immune phenotyping for checkpoint immunotherapy patient selection from a $5 H&E slide without molecular assays. In breast cancer, each lesion type produces a characteristic six-channel fingerprint functioning as an interpretable computer-aided diagnosis (CAD) system for quantitative BI-RADS assessment and subtype classification (IDC vs ILC vs DCIS vs IBC). A five-level xAI audit trail provides complete traceability from clinical decision support output to specific biological structures visible on the original images. Conclusion: VSD establishes a unified, interpretable-by-design mathematical framework for explainable tissue composition analysis across imaging modalities and cancer types. Unlike black-box deep learning and post-hoc xAI approaches, VSD provides inherently interpretable, clinically verifiable cancer detection and immune phenotyping from standard clinical imaging at existing costs - without requiring foundation model infrastructure, specialized hardware, or molecular assays. The open-source pipeline (Google Colab, Supplementary Material) enables immediate reproducibility and extension to additional cancer types across the pan-cancer TCGA atlas.
Hou, J.; Yi, X.; Li, C.; Li, J.; Cao, H.; Lu, Q.; Yu, X.
Predicting response to induction chemotherapy (IC) and overall survival (OS) is critical for optimizing treatment in patients with locally advanced nasopharyngeal carcinoma (LANPC). This study aimed to develop and validate a multi-task deep learning model integrating pretreatment MRI and whole slide images (WSIs) to predict IC response and OS in LANPC. Pretreatment MRI and WSIs from 404 patients with LANPC were retrospectively collected to construct a multi-task model (MoEMIL) for the simultaneous prediction of early IC response and OS. MoEMIL employed multi-instance learning to process WSIs, PyRadiomics and a convolutional neural network (ResNet50) to extract MRI features, and fused multimodal features through a multi-gate mixture-of-experts architecture. Clustering-constrained attention multiple instance learning and gradient-weighted class activation mapping were applied for visualization and interpretation. MoEMIL effectively stratified patients into good and poor IC response groups, achieving areas under the curve of 0.917, 0.869, and 0.801 in the train, validation, and test sets, respectively, and outperformed the deep learning radiomics model, the pathomics model and TNM staging. The model also stratified patients into high- and low-risk OS groups (P < 0.05). MoEMIL shows promise as a decision-support tool for early IC response prediction and prognostication in LANPC. Author Summary: We have developed a deep learning model that integrates two types of medical images, including magnetic resonance imaging (MRI) and digital pathological slices, to simultaneously predict response to induction chemotherapy and prognosis in patients with locally advanced nasopharyngeal carcinoma. Current treatment decisions primarily rely on traditional tumor staging (TNM), which often fails to comprehensively reflect the complexity of the disease.
Our model, named MoEMIL, was trained and tested on data from 404 patients across two hospitals and consistently outperformed both single-model approaches and TNM staging methods. By identifying patients who exhibit poor response to induction chemotherapy or higher prognostic risk, our tool can assist clinicians in achieving personalized treatment, enabling intensified management for high-risk patients and avoiding unnecessary side effects for low-risk patients. Additionally, we visualize the model's reasoning process through heat map generation, which highlights the image regions exerting the greatest influence on prediction outcomes. This work represents a step toward more precise treatment for nasopharyngeal carcinoma; however, larger-scale prospective studies are required before the model can be integrated into routine clinical practice.
Roca, M.; Messuti, G.; Klepachevskyi, D.; Angiolelli, M.; Bonavita, S.; Trojsi, F.; Demuru, M.; Troisi Lopez, E.; Chevallier, S.; Yger, F.; Saudargiene, A.; Sorrentino, P.; Corsi, M.-C.
Neurodegenerative diseases such as Mild Cognitive Impairment (MCI), Multiple Sclerosis (MS), Parkinson's Disease (PD), and Amyotrophic Lateral Sclerosis (ALS) are becoming more prevalent. Each of these diseases, despite its specific pathophysiological mechanisms, leads to widespread reorganization of brain activity. However, the corresponding neurophysiological signatures of these changes have been elusive. As a consequence, to date, it is not possible to effectively distinguish these diseases from neurophysiological data alone. This work uses Magnetoencephalography (MEG) resting-state data, combined with interpretable machine learning techniques, to support differential diagnosis. We expand on previous work and design a Riemannian geometry-based classification pipeline. The pipeline is fed with typical connectivity metrics, such as covariance or correlation matrices. To maintain interpretability while reducing feature dimensionality, we introduce a classifier-independent feature selection procedure that uses effect sizes derived from the Kruskal-Wallis test. The ensemble classification pipeline, called REDDI, achieved a mean balanced accuracy of 0.81 (+/-0.04) across five folds, representing a 13% improvement over the state-of-the-art, while remaining clinically transparent. As such, our approach provides a reliable, interpretable, data-driven, operator-independent decision-support tool for neurology.
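The classifier-independent feature-selection step can be sketched as follows. The abstract does not specify which effect-size definition is used, so epsilon-squared (H divided by n − 1) is taken here as one common choice, and the data are synthetic:

```python
import numpy as np
from scipy.stats import kruskal

def kw_effect_size(*groups):
    """Kruskal-Wallis H, p-value, and an epsilon-squared effect size H/(n-1).

    Epsilon-squared is one common Kruskal-Wallis effect size; the paper's
    exact definition is not given in the abstract.
    """
    h, p = kruskal(*groups)
    n = sum(len(g) for g in groups)
    return h, p, h / (n - 1)

def select_features(feature_matrix_by_class, top_k):
    """Rank features by effect size across diagnostic classes; keep top_k.

    feature_matrix_by_class: list of (n_subjects_c, n_features) arrays,
    one per class. No classifier is involved, so the ranking is
    classifier-independent by construction.
    """
    n_features = feature_matrix_by_class[0].shape[1]
    scores = []
    for j in range(n_features):
        groups = [X[:, j] for X in feature_matrix_by_class]
        _, _, eps2 = kw_effect_size(*groups)
        scores.append(eps2)
    return np.argsort(scores)[::-1][:top_k]

rng = np.random.default_rng(1)
# Toy data: 3 classes, 10 features; only feature 0 carries a class difference
classes = [rng.normal(shift, 1.0, size=(30, 10)) for shift in (0.0, 1.5, 3.0)]
for c in classes:
    c[:, 1:] = rng.normal(0.0, 1.0, size=(30, 9))  # features 1..9: pure noise
selected = select_features(classes, top_k=3)
print("top features:", selected)  # feature 0 should rank first
```

Keeping the ranking criterion free of any fitted classifier is what lets the selected features stay interpretable on their own terms.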
Gangolli, M.; Perkins, N. J.; Marinelli, L.; Basser, P. J.; Avram, A. V.
BACKGROUND: Mild traumatic brain injury (mTBI) is a signature injury in civilian and military populations that remains invisible to detection by conventional radiological methods. Diffusion MRI has been identified as a potential clinical tool for revealing subtle microstructural alterations associated with mTBI. OBJECTIVE: This study evaluates whether a comprehensive and powerful diffusion MRI (dMRI) technique called mean apparent propagator (MAP) MRI can detect sequelae of mTBI. METHODS: We analyzed data from 417 participants of the GE/NFL prospective mTBI study which included 143 matched controls (mean age, 21.9 ± 8.3 years; 76 women) and 274 patients with acute mTBI and GCS ≥ 13 (mean age, 21.9 ± 8.5 years; 131 women). All participants underwent MRI exams at up to four visits including structural high-resolution T1W, T2W, FLAIR-T2W, and dMRI, in addition to clinical assessments of post-concussive physical symptoms (RPQ-3), psychosocial functioning and lifestyle symptoms (RPQ-13), and postural stability (BESS). The dMRI data for each subject were co-registered across all visits and analyzed using the MAP-MRI framework to measure and map the distribution of net microscopic displacements of diffusing water molecules in tissue and ultimately compute the microstructural MAP-MRI tissue parameters including propagator anisotropy (PA), Non-Gaussianity (NG), return-to-origin probability (RTOP), return-to-axis probability (RTAP), and return-to-plane probability (RTPP). We quantified voxel-wise and region-of-interest (ROI)-based changes in these parameters across all four visits. RESULTS: MAP-MRI parameter values were within the expected ranges and showed relatively little variation across visits. We found no significant differences in the longitudinal trajectories of these parameters between mTBI patients and controls.
At acute post-injury timepoints, RPQ-3 and RPQ-13 scores were increased in mTBI patients relative to controls, while BESS scores were not significantly different between groups. Analysis of dMRI metrics and clinical mTBI markers showed significant correspondence between MAP-MRI metrics in cortical gray matter, caudate and pallidum and BESS scores. CONCLUSION: We developed and tested a state-of-the-art quantitative image processing pipeline for sensitive analysis and detection of subtle tissue changes in longitudinal clinical diffusion MRI data. The absence of a significant statistical difference between populations in the dMRI parameters in this study suggests that the mTBI corresponded to acute post-injury clinical symptoms but that the injury was not severe enough to cause detectable microstructural damage/alterations, and that increased diffusion sensitization combined with improved analysis techniques may be needed. CLINICAL IMPACT: These findings suggest that acute mTBI (GCS ≥ 13) may not be detectable with diffusion MRI. TRIAL REGISTRATION: ClinicalTrials.gov NCT02556177
Tan, J.; Tang, P. H.
Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXR) are an important tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. The ability to communicate findings both to clinicians and laypersons allows multimodal large language models (MLLMs) to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers. Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared to the baseline average agent for paediatric radiological pneumonia detection. Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPT-OSS-20B aggregation were compared against the average agent performance. The primary metric evaluated was OvR AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and OvO AUROC. Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-score (p_balanced = 0.0028) for the balanced dataset. Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs, with potential for integration into emergency departments. Our system's high specificity supports triage by flagging high-risk radiological pneumonia cases.
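Soft voting as evaluated above amounts to averaging each agent's per-category probability vector before taking the argmax, in contrast to hard majority voting over each agent's label. A minimal sketch with toy probabilities rather than MedGemma outputs:

```python
import numpy as np

def soft_vote(agent_probs):
    """Average per-category probabilities across agents, then take argmax.

    agent_probs: array of shape (n_agents, n_categories), rows summing to 1.
    Returns (winning category index, averaged probability vector).
    """
    probs = np.asarray(agent_probs, dtype=float)
    mean_probs = probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs

def majority_vote(agent_labels, n_categories):
    """Hard majority vote over each agent's argmax label."""
    counts = np.bincount(agent_labels, minlength=n_categories)
    return int(counts.argmax())

# Toy example: 3 agents, 5 pneumonia-likelihood categories
probs = np.array([
    [0.10, 0.15, 0.20, 0.35, 0.20],
    [0.05, 0.10, 0.40, 0.30, 0.15],
    [0.05, 0.05, 0.15, 0.45, 0.30],
])
label, avg = soft_vote(probs)
print("soft-vote category:", label)
print("hard-majority category:", majority_vote(probs.argmax(axis=1), 5))
```

Soft voting keeps each agent's full confidence distribution, which is why it can outperform majority voting when agents are individually uncertain but collectively consistent.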
Ben-Joseph, J.
Lightweight epidemic calculators are widely used for teaching and rapid scenario exploration, yet many omit the methodological detail needed for scientific reuse. We present a browser-native SIR calculator that exposes forward Euler and classical fourth-order Runge–Kutta (RK4) integration alongside epidemiologically interpretable outputs and a population-conservation diagnostic. The implementation is anchored to analytical properties of the deterministic SIR system, including the epidemic threshold, the peak condition, and the final-size relation. Benchmark experiments show that RK4 is essentially step-size invariant over practical discretizations, whereas Euler at a coarse one-day step overestimates peak prevalence by 3.97% and final size by 0.66% relative to a fine-step RK4 reference. These results demonstrate that browser-based tools can support publication-quality computational narratives when solver choice, diagnostics, and assumptions are treated as first-class outputs.
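The Euler-versus-RK4 comparison can be reproduced in outline as follows; the values of beta, gamma, and the initial conditions below are illustrative choices, not the paper's benchmark settings:

```python
import numpy as np

def sir_rhs(y, beta, gamma):
    """Deterministic SIR right-hand side; S, I, R are population fractions."""
    s, i, r = y
    return np.array([-beta * s * i, beta * s * i - gamma * i, gamma * i])

def integrate(y0, beta, gamma, dt, days, method="rk4"):
    """Integrate SIR with forward Euler or classical fourth-order Runge-Kutta."""
    y = np.array(y0, dtype=float)
    traj = [y.copy()]
    for _ in range(int(days / dt)):
        if method == "euler":
            y = y + dt * sir_rhs(y, beta, gamma)
        else:
            k1 = sir_rhs(y, beta, gamma)
            k2 = sir_rhs(y + dt / 2 * k1, beta, gamma)
            k3 = sir_rhs(y + dt / 2 * k2, beta, gamma)
            k4 = sir_rhs(y + dt * k3, beta, gamma)
            y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(y.copy())
    return np.array(traj)

# Illustrative parameters (not the paper's): R0 = 2.5, 10-day infectious period
beta, gamma = 0.25, 0.1
y0 = [0.999, 0.001, 0.0]
euler = integrate(y0, beta, gamma, dt=1.0, days=300, method="euler")
rk4 = integrate(y0, beta, gamma, dt=1.0, days=300, method="rk4")
print("peak prevalence (Euler, dt=1):", euler[:, 1].max())
print("peak prevalence (RK4,   dt=1):", rk4[:, 1].max())
print("final size (RK4):", rk4[-1, 2])
```

The population-conservation diagnostic falls out for free: because the three RHS terms sum to zero, both schemes conserve S + I + R to machine precision, so any drift flags an implementation bug rather than a solver artifact.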
Altinok, O.; Ho, W. L. J.; Robinson, L.; Goldgof, D.; Hall, L. O.; Guvenis, A.; Schabath, M. B.
Objectives: Among surgically resected non-small cell lung cancer (NSCLC) patients with similar stage and histopathological characteristics, there is variability in patient outcomes, which highlights the urgency of identifying biomarkers to predict recurrence. The goal of this study was to systematically develop a pre-surgical CT-based habitat-based radiomics classifier to predict risk of recurrence in NSCLC. Methods: This study included 293 NSCLC patients with surgically resected stage IA-IIIA disease who were randomly divided into training (n = 195) and test (n = 98) cohorts. From pre-surgical CT images, tumor habitats were generated using two-level unsupervised clustering, and then radiomic features were calculated from the intratumoral region and habitat-defined subregions. Using ridge-regularized logistic regression, separate classifiers were developed to predict 3-year recurrence using intratumoral radiomics, habitat-based radiomics, and a combined model (intratumoral and habitat) generated using a stacked learning framework. For each classifier, the probability of recurrence was calculated for each patient, and then numerous statistical and machine learning approaches were utilized to stratify patients for recurrence-free survival. Results: The combined radiomics classifier yielded a superior AUC (0.82) compared to the intratumoral (AUC = 0.75) and habitat radiomics (AUC = 0.81) models. When the classifiers were used to stratify high- versus low-risk patients using a cut-point identified by decision tree analysis, the combined model yielded the largest risk estimate for high-risk patients (HR = 8.43; 95% CI 2.47 - 28.81), compared to the habitat (HR = 5.41; 95% CI 2.08 - 14.09) and intratumoral radiomics (HR = 3.54; 95% CI 1.45 - 8.66) models. SHAP analyses indicated that habitat-derived information contributed most strongly to recurrence prediction.
Conclusions: This study revealed that habitat-based radiomics provided superior statistical performance compared with intratumoral radiomics for predicting recurrence in NSCLC.
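The stacked-learning combination of the intratumoral and habitat classifiers might be sketched as below, using ridge-regularized (L2) logistic regression throughout. The synthetic features, fold count, and meta-learner details are assumptions, since the abstract does not specify them:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def stacked_recurrence_classifier(X_intra, X_habitat, y):
    """Combine intratumoral and habitat radiomics via stacked learning.

    Two ridge-regularized (L2) logistic classifiers are trained on the two
    feature blocks; their out-of-fold recurrence probabilities feed a
    logistic meta-learner. Feature contents here are placeholders.
    """
    base_intra = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
    base_hab = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
    # Out-of-fold probabilities avoid leaking training labels into the stacker
    p_intra = cross_val_predict(base_intra, X_intra, y, cv=5,
                                method="predict_proba")[:, 1]
    p_hab = cross_val_predict(base_hab, X_habitat, y, cv=5,
                              method="predict_proba")[:, 1]
    meta = LogisticRegression(penalty="l2", max_iter=1000)
    meta.fit(np.column_stack([p_intra, p_hab]), y)
    base_intra.fit(X_intra, y)
    base_hab.fit(X_habitat, y)
    return base_intra, base_hab, meta

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=200)
X_intra = rng.normal(size=(200, 20)) + y[:, None] * 0.5
X_hab = rng.normal(size=(200, 30)) + y[:, None] * 0.8
base_intra, base_hab, meta = stacked_recurrence_classifier(X_intra, X_hab, y)
probs = meta.predict_proba(np.column_stack([
    base_intra.predict_proba(X_intra)[:, 1],
    base_hab.predict_proba(X_hab)[:, 1],
]))[:, 1]
print("mean predicted recurrence probability:", probs.mean())
```

A per-patient probability from the meta-learner is then thresholded (here the study used a decision-tree-derived cut-point) to stratify high- versus low-risk groups.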
Johansson, J.; Palonen, S.; Egorova, K.; Tuisku, J.; Harju, H.; Kärpijoki, H.; Maaniitty, T.; Saraste, A.; Saari, T.; Tuomola, N.; Rinne, J.; Nuutila, P.; Latva-Rasku, A.; Virtanen, K. A.; Knuuti, J.; Nummenmaa, L.
Background: Quantitative cerebral blood flow (CBF) measured with [15O]water positron emission tomography (PET) is the reference standard for quantifying brain perfusion. However, clinical interpretation of individual CBF measurements is limited by the absence of large normative datasets accounting for physiological variability across the adult lifespan. Long-axial field-of-view PET enables high-sensitivity quantitative [15O]water perfusion imaging without arterial blood sampling, allowing normative characterization of cerebral perfusion at unprecedented scale. The aim of this study was to establish normative and covariate-adjusted models of cerebral blood flow across the adult lifespan using total-body [15O]water PET. Methods: Quantitative CBF measurements were obtained in 302 neurologically healthy adults (age 21-86 years) using total-body [15O]water PET. Linear mixed-effects models were used to evaluate the effects of age, sex, body mass index (BMI), and blood hemoglobin concentration on CBF and to generate normative prediction models across the adult lifespan. Between-subject and within-subject variability were estimated from repeated scans in a subset of participants (n=51). Results: Mean grey matter CBF was 46.1 mL/(min*dL), with substantial inter-individual variability but high within-subject reproducibility (intraclass correlation coefficients 0.78-0.89). Advancing age was associated with a decline in CBF of approximately 7% per decade (p_FDR < 10^-12). Higher BMI was associated with lower CBF (approximately -6% per 10 kg/m^2; p_FDR < 0.01). Women exhibited higher CBF than men (approximately 7.5%), but this difference was largely explained by lower blood hemoglobin concentration in women. Covariate-adjusted models were used to generate normative predictions and prediction intervals describing expected CBF across adulthood.
Conclusion: This study establishes a normative database of quantitative cerebral blood flow across the adult lifespan using high-sensitivity [15O]water PET. Age, BMI, and hemoglobin are major determinants of inter-individual variability in CBF. The resulting generative models provide a quantitative reference framework for interpreting cerebral perfusion measurements and may enable automated detection of abnormal brain perfusion in clinical PET imaging.
Stockbridge, M. D.; Faria, A. V.; Neal, V.; Diaz-Carr, I.; Soule, Z.; Ahmad, Y. B.; Khanduja, S.; Whitman, G.; Hillis, A. E.; Cho, S.-M.
The SAFE MRI ECMO (NCT05469139) study established the safety of ultra-low-field 64mT MRI in patients receiving extracorporeal membrane oxygenation (ECMO) in the setting of intensive care and demonstrated that these images were highly sensitive in detecting acquired brain injuries. This retrospective analysis of prospectively collected observational data sought to expand on these findings in light of the crucial need for neurological monitoring while patients receive ECMO by evaluating the feasibility of volumetric analyses derived from ultra-low-field MR images. T2-weighted scans from thirty patients who received ultra-low-field MRI while undergoing ECMO at Johns Hopkins Hospital were analyzed using a volumetric pipeline to determine whole brain volume and volumes of total grey matter, total white matter, subcortical grey matter, ventricles, left hemisphere, right hemisphere, telencephalon, left and right lateral ventricles, the total intracranial volume, and the cerebellum. Segmented brain volumes in patients undergoing ECMO were comparable to measurements obtained using conventional field and ultra-low-field MRI in the absence of ECMO instrumentation. The subgroup analysis demonstrated subtle volumetric differences between patients supported with venoarterial ECMO and those receiving venovenous ECMO. These data provide the first evidence that ultra-low-field MRI provides volumetric measurements comparable to conventional field-strength MRI, even in the presence of ECMO circuitry, supporting its feasibility for neuroimaging in critically ill patients.
Quigg, M.; Chernyavskiy, P.; Terrell, W.; Smetana, R.; Muttikal, T. E.; Wardius, M.; Kundu, B.
Background and Purpose: 2-[18F]fluoro-2-deoxy-D-glucose positron emission tomography (static PET) has mixed specificity and sensitivity in targeting epileptic zones in the noninvasive stage of epilepsy surgery evaluations. We compared the signal quality of static PET with that of an interictal dynamic PET (iD-PET) method. Materials and Methods: We calculated the signal quality of static PET and iD-PET obtained from a cohort of patients with focal epilepsy. We developed a Bayesian regional estimated signal quality (BRESQ) technique to objectively compare signal-to-noise ratios (SNRs) by region of interest (ROI) within subjects. Results: Adjusted for ROI size and neighboring regions, iD-PET was superior to static PET with probability >95% in 8/36 regions, >90% in 21/36 regions, and >80% in 29/36 regions. The top five regions with the largest adjusted SNR differences (greatest magnitude of iD-PET superiority) were the Temporal Mesial (Left and Right), Occipital Lateral (Left and Right), and the Left Frontal Inferior Base. Conclusions: We found that iD-PET yielded a superior SNR in most ROIs. BRESQ offers a scalable and generalizable method to quantify signal quality between brain mapping modalities.
Zhang, Q.; Tang, Q.; Vu, T.; Pandit, K.; Cui, Y.; Yan, F.; Wang, N.; Li, J.; Yao, A.; Menozzi, L.; Fung, K.-M.; Yu, Z.; Parrack, P.; Ali, W.; Liu, R.; Wang, C.; Liu, J.; Hostetler, C. A.; Milam, A. N.; Nave, B.; Squires, R. A.; Battula, N. R.; Pan, C.; Martins, P. N.; Yao, J.
End-stage liver disease (ESLD) is one of the leading causes of death worldwide. Currently, the only curative option for patients with ESLD is liver transplantation. However, the demand for donor livers far exceeds the available supply, partly because many potentially viable livers are discarded following biopsy evaluation. While biopsy is the gold standard for assessing liver histological features related to graft quality and transplant suitability, it often leads to high discard rates due to its susceptibility to sampling errors and limited spatial coverage. Moreover, biopsy is invasive, time-consuming, and unavailable in clinical facilities with limited resources. Here, we present an AI-assisted photoacoustic/ultrasound (PA/US) imaging framework for quantitative assessment of human donor liver graft quality and transplant suitability at the whole-organ scale. With multimodal volumetric PA/US images as the input, our deep-learning (DL) model accurately predicted the risk level of fibrosis and steatosis, which indicate graft quality and transplant suitability, when compared with true pathological scores. DL also identified the imaging modes (PAI wavelength and B-mode USI) that correlated the most with prediction accuracy, without relying on ill-posed spectral unmixing. Our method was evaluated in six discarded human donor livers comprising sixty spatially matched regions of interest. Our study will pave the way for a new standard of care for assessing organ graft quality and transplant suitability that is fast, noninvasive, and spatially thorough, to prevent unnecessary organ discards in liver transplantation.
Chandra, S.
Background. Pancreatic ductal adenocarcinoma (PDAC) has a five-year survival rate of approximately 12%, largely because it is typically diagnosed at an advanced stage. CT-based computational methods for early detection exist but rely on black-box deep learning or large texture feature sets without tissue-specific interpretability. Methods. We developed Virtual Spectral Decomposition (VSD), which applies six parameterized sigmoid functions S(HU) = 1/(1+exp(-alpha x (HU - mu))) to standard portal-venous CT, decomposing each pixel into tissue-specific response channels for fat (mu=-60), fluid (mu=10), parenchyma (mu=45), stroma (mu=75), vascular (mu=130), and calcification (mu=250). Dendritic Binary Gating identifies structural content per channel using morphological filtering, enabling co-firing analysis and lone firer identification. A 25-feature signature was extracted per patient. Three independent datasets were analyzed: NIH Pancreas-CT (n=78 healthy), Medical Segmentation Decathlon Task07 (n=281 PDAC, paired tumor/adjacent tissue), and CPTAC-PDA from The Cancer Imaging Archive (n=82, multi-institutional, with DICOM time point tags). The same six sigmoid parameters were used across all datasets without retraining. Results. VSD achieved AUC 0.943 for field effect detection (healthy vs cancer-adjacent parenchyma) and AUC 0.931 for patient-stratified tumor specification on MSD. On CPTAC-PDA, VSD achieved AUC 0.961 (6 features) and 0.979 (25 features) for distinguishing healthy from cancer-bearing pancreas on scans obtained prior to pathological diagnosis. All significant features replicated across datasets in the same direction: z_fat (d=-2.10, p=3.5e-27), z_fluid (d=-2.76, p=2.4e-38), fire_fat (d=+2.18, p=1.2e-28). Critically, VSD severity did not correlate with days-from-diagnosis (r=-0.008, p=0.944) across a range of day -1394 to day +249. 
Patient C3N-01375, scanned 3.8 years before pathological diagnosis, had VSD severity 1.87, well above the healthy mean of 0.94 +/- 0.33. The tissue transformation signature was temporally stable, indicating an early, persistent tissue state rather than a progressively worsening process. Conclusions. VSD with Dendritic Binary Gating detects a stable pancreatic tissue composition signature on standard CT that is present years before clinical diagnosis, validated across three independent datasets without parameter adjustment. The six sigmoid channels map to biologically meaningful tissue components through a fully transparent interpretability chain. The temporal stability of the signal implies a detection window of 3-7 years, consistent with known PanIN-3 microenvironment transformation timelines. VSD functions as a single-scan screening tool applicable to any abdominal CT performed during the pre-clinical window.
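As a minimal sketch of the decomposition step above, the six sigmoid channels from the Methods can be evaluated per pixel directly from S(HU) = 1/(1+exp(-alpha x (HU - mu))). The slope alpha is not given in the abstract, so the value below is a placeholder, and the Dendritic Binary Gating stage is omitted.

```python
import math

# Channel centers (mu, in HU) as listed in the abstract; the shared slope
# alpha is not reported there, so 0.1 is a hypothetical placeholder.
MU = {"fat": -60, "fluid": 10, "parenchyma": 45,
      "stroma": 75, "vascular": 130, "calcification": 250}
ALPHA = 0.1  # assumption, not from the paper

def vsd_channels(hu):
    """Six sigmoid responses S(HU) = 1 / (1 + exp(-alpha * (HU - mu)))."""
    return {name: 1.0 / (1.0 + math.exp(-ALPHA * (hu - mu)))
            for name, mu in MU.items()}

resp = vsd_channels(-60)      # a fat-density pixel
print(round(resp["fat"], 2))  # sits at its channel midpoint: 0.5
```

Each pixel thus yields a six-value tissue-response vector; the published method then applies morphological gating per channel before extracting the 25-feature patient signature.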
Korni, A.; Zandi, E.
Background: Plasma biomarkers demonstrate strong within-cohort performance for identifying cerebral amyloid pathology, but their real-world clinical utility depends on generalization across populations and assay platforms. The impact of cross-cohort deployment on clinically actionable metrics such as negative predictive value (NPV) remains poorly characterized. Objective: To evaluate the performance and portability of plasma biomarker-based machine learning models for amyloid PET prediction across independent cohorts, with emphasis on calibration and clinically relevant predictive values. Methods: Data from ADNI (n=885) and A4 (n=822) were analyzed. Machine learning models were trained within each cohort to predict amyloid PET status and continuous amyloid burden (centiloids). Performance was assessed using ROC AUC, accuracy, R^2, and RMSE. Cross-cohort generalizability was evaluated using bidirectional transfer without retraining. Calibration, predictive values, and decision curve analysis were used to assess clinical utility. Results: Within-cohort discrimination was high (AUC up to 0.913 in ADNI and 0.870 in A4), with moderate performance for centiloid prediction (R^2 up to 0.628 and 0.535, respectively). Cross-cohort deployment resulted in modest attenuation of AUC (~4-7%) but substantially greater degradation in clinically actionable performance. NPV declined from 0.831 to 0.644 under ADNI-to-A4 transfer (~19 percentage points) despite preserved discrimination. Calibration analyses demonstrated systematic probability misestimation, and decision curve analysis showed reduced net clinical benefit. Biomarker distribution differences across cohorts were consistent with dataset shift. Conclusion: Plasma biomarker models retain discrimination across cohorts but exhibit clinically meaningful degradation in predictive value under deployment. Calibration instability and prevalence differences critically affect NPV, highlighting the need for cross-cohort validation, calibration assessment, and assay harmonization before clinical implementation.
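The NPV degradation described above is partly a prevalence effect, which Bayes' rule makes concrete: at a fixed operating point (sensitivity and specificity, i.e. discrimination, held constant), NPV falls as the positive-class prevalence rises. The numbers below are hypothetical, not the cohort values.

```python
def npv(sens, spec, prev):
    """Negative predictive value from sensitivity, specificity, and prevalence:
    NPV = TN / (TN + FN) with TN = spec*(1-prev) and FN = (1-sens)*prev."""
    tn = spec * (1.0 - prev)
    fn = (1.0 - sens) * prev
    return tn / (tn + fn)

# Hypothetical fixed operating point: discrimination unchanged, only the
# amyloid-positive prevalence differs between the two cohorts.
print(round(npv(0.85, 0.80, 0.30), 3))  # lower-prevalence cohort: 0.926
print(round(npv(0.85, 0.80, 0.55), 3))  # higher-prevalence cohort: 0.814
```

This is why preserved AUC under transfer can coexist with a large NPV drop: AUC is prevalence-invariant, while predictive values are not, and miscalibrated probabilities compound the effect.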
Hakata, Y.; Oikawa, M.; Fujisawa, S.
Background. Adult diffuse glioma is a representative class of primary brain tumors for which accurate MRI-based tumor segmentation is indispensable for treatment planning. Conventional automated segmentation methods have relied primarily on image information and spatial prompts, and auxiliary clinical information that is routinely acquired in clinical practice has not been sufficiently exploited as an input. Objective. Building on a dual-prompt-driven Segment Anything Model (SAM) extension framework that fuses visual and language reference prompts, we propose a method that integrates patient demographics, unsupervised molecular cluster variables derived from TCGA high-throughput profiling, and histopathological parameters as learnable prompt embeddings, and we evaluate its effect on the accuracy of lower-grade glioma (LGG) MRI segmentation. Methods. An auxiliary prompt encoder converts clinical metadata into high-dimensional embeddings that are fused with the prompt representations of Segment Anything Model (SAM) ViT-B through a cross-attention fusion mechanism. The TCGA-LGG MRI Segmentation dataset (Kaggle release by Buda et al.; n = 110 patients; WHO grade II-III) was split at the patient level (train/val/test = 71/17/22) using three different random seeds, and the three slices with the largest tumor area were extracted from each patient. To avoid pseudo-replication arising from multiple slices per patient and repeated measurements across seeds, our primary analysis aggregated Dice and 95th-percentile Hausdorff distance (HD95) to the patient x seed unit (n = 66); secondary analyses at the unique-patient level (n = 22) and at the per-slice level (n = 198) are also reported. Pairwise comparisons used paired t-tests with Bonferroni correction (k = 3) and Wilcoxon signed-rank tests, and a permutation test (K = 30) served as an auxiliary check of effective use of the auxiliary information. Results. 
At the patient x seed level (n = 66), Proposed (full clinical) achieved a Dice gain of +0.287 over the zero-shot SAM ViT-B baseline (paired-t p = 4.2 x 10^-15, Cohen's d_z = +1.25, Bonferroni-corrected p << 0.001; Wilcoxon p = 2.0 x 10^-10), and HD95 improved from 218.2 to 64.6. Because zero-shot SAM is not designed for domain-specific medical segmentation, the large absolute HD95 gap largely reflects the expected domain gap rather than a competitive baseline. The additional contribution of the full clinical configuration over the demographics-only configuration was Dice = +0.023 (paired-t p = 0.057, Bonferroni-corrected p = 0.172), which did not reach statistical significance at the patient level and is reported as a directional trend. The permutation test (K = 30, seed 2025) yielded real-metadata Dice = 0.819 versus a shuffled-metadata mean of 0.773, giving an empirical p = 0.032 = 1/(K + 1), which is at the resolution limit of this test and should therefore be interpreted as preliminary evidence. Conclusions. Integrating auxiliary clinical information as multimodal prompts produced a large improvement over the zero-shot SAM baseline on this LGG cohort. More importantly, a robustness analysis showed that Proposed (full clinical) outperformed the trained Base (no auxiliary information) under all tested spatial-prompt conditions, including perfect centroid (+0.014), and that the advantage was most pronounced in the prompt-free regime (+0.231, p = 0.039), where the base model collapsed but the proposed model maintained meaningful segmentation by leveraging clinical metadata alone. The additional contribution of molecular and histopathological information beyond demographics was not statistically resolved at the patient level (+0.023, n.s.). Establishing clinical utility will require external validation on larger multi-center cohorts and direct comparisons with established segmentation methods. 
Keywords: brain tumor segmentation; Segment Anything Model (SAM); vision-language prompt-driven segmentation; auxiliary clinical prompts; multimodal learning; TCGA-LGG; deep learning
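The cross-attention fusion mechanism is only named in the abstract, so here is a dependency-free, single-head sketch of the idea: prompt-token queries attend over clinical-metadata embeddings used as keys and values. Learned Q/K/V projection matrices, multi-head splitting, and the SAM ViT-B integration are omitted, and all embeddings are toy values.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attend(prompt, clinical):
    """Single-head cross-attention sketch: each prompt token (query) attends
    over clinical-metadata embeddings (keys = values), with scaled dot-product
    scores. Real prompt fusion would use learned Q/K/V projections."""
    d = len(clinical[0])
    out = []
    for q in prompt:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in clinical]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, clinical))
                    for j in range(d)])
    return out

# Toy 2-D embeddings: one prompt token, two clinical-metadata tokens
fused = cross_attend([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

The fused outputs would then replace or augment SAM's prompt representations, letting metadata steer the mask decoder even when no spatial prompt is supplied, which is the prompt-free regime where the paper reports the largest gain.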
Mboya, G. O.
Machine learning models trained on observational data from one environment frequently fail when deployed in another, because standard learning algorithms exploit spurious correlations alongside causal ones. Invariant learning methods address this problem by seeking representations that support stable prediction across training environments, but their behavior on tabular data remains poorly characterized. We present CausTab, a gradient variance regularization framework for causal invariant representation learning on mixed tabular data. CausTab penalizes the variance of parameter gradients across training environments, providing a richer invariance signal than the scalar penalty used by Invariant Risk Minimization (IRM). We provide formal results showing that the gradient variance penalty is zero at causally invariant solutions and positive at solutions that rely on spurious features. Through experiments on synthetic data across three spurious-correlation regimes, four cycles of the National Health and Nutrition Examination Survey (NHANES), and four hospital systems in the UCI Heart Disease dataset, we demonstrate that: (1) IRM consistently degrades relative to standard empirical risk minimization (ERM) on tabular data, losing up to 13.8 AUC points in spurious-dominant settings, a failure we trace mechanistically to penalty collapse during training; (2) CausTab matches or exceeds ERM in every experimental condition; (3) CausTab achieves consistently better probability calibration than both ERM and IRM; and (4) invariant learning methods fail when environments differ in outcome prevalence rather than in spurious feature correlations, a boundary condition we characterize both empirically and theoretically. We introduce the Spurious Dominance Index (SDI), a practical scalar diagnostic for determining whether a dataset requires invariant learning, and validate it across all experimental settings.
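As we read the abstract, the core penalty is the variance of per-environment parameter gradients around their mean. A minimal sketch for a logistic model follows; the loss, model, and toy environments are illustrative stand-ins, not the CausTab implementation.

```python
import math

def grad_logistic(w, X, y):
    """Mean gradient of the logistic loss for a linear model (no bias term)."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
        for j, xj in enumerate(xi):
            g[j] += (p - yi) * xj
    return [gj / len(X) for gj in g]

def grad_variance_penalty(w, envs):
    """Gradient-variance penalty (our reading of the abstract): variance of the
    per-environment gradients around their mean, summed over parameters."""
    grads = [grad_logistic(w, X, y) for X, y in envs]
    n = len(grads)
    penalty = 0.0
    for j in range(len(w)):
        mean_j = sum(g[j] for g in grads) / n
        penalty += sum((g[j] - mean_j) ** 2 for g in grads) / n
    return penalty

env_a = ([[1.0], [-1.0]], [1, 0])
env_b = ([[1.0], [-1.0]], [1, 0])  # identical environment
env_c = ([[2.0], [-2.0]], [1, 0])  # same labeling rule, shifted feature scale
print(grad_variance_penalty([0.0], (env_a, env_b)))      # identical -> 0.0
print(grad_variance_penalty([0.0], (env_a, env_c)) > 0)  # mismatch -> True
```

In training, this penalty would be added to the pooled empirical risk with a weighting coefficient, so that minimization prefers parameters whose gradients, and hence local loss geometry, agree across environments.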